{
 "metadata": {
  "name": "skdata_quick_intro"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# Hyperopt-Sklearn Intro\n",
      "\n",
      "## Easiest possible thing\n",
      "\n",
      "As an ML researcher, I want a quick way to do model selection\n",
      "implicitly, in order to get a baseline accuracy score for a new data\n",
      "set."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Skdata-based code\n",
      "import skdata.iris.view\n",
      "from skdata.base import SklearnClassifier\n",
      "from hpsklearn.estimator import HyperoptEstimatorFactory\n",
      "\n",
      "view = skdata.iris.view.KfoldClassification(5)\n",
      "algo = SklearnClassifier(\n",
      "    HyperoptEstimatorFactory(\n",
      "        max_iter=25,  # -- consider also a time-based budget\n",
      "    ))\n",
      "mean_test_error = view.protocol(algo)\n",
      "print 'mean test error:', mean_test_error"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As an ML researcher, I want to evaluate a certain parly-defined model class, in order to do model-family comparisons. For example, PCA followed by SVM."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from hpsklearn.components import svc, pca\n",
      "\n",
      "algo_pca_svm = SklearnClassifier(\n",
      "    HyperoptEstimatorFactory(\n",
      "        max_iter=25,  # -- consider also a time-based budget\n",
      "        preprocessing=[pca('pca')],\n",
      "        classifier=svc('svc')))\n",
      "mean_test_error = view.protocol(algo_pca_svm)\n",
      "print 'mean test error:', mean_test_error"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As a domain expert, I have a particular pre-processing that I believe reveals important patterns in my data.  I would like to know how good a classifier can be built on top of my preprocessing algorithm."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def my_feature_extractor(name, *kwargs):\n",
      "    # Should return a pyll graph that evaluates to a Sklearn-compatible\n",
      "    # feature-extraction component (i.e. with a transform() method)\n",
      "    raise NotImplementedError()\n",
      "    \n",
      "algo_pca_svm = SklearnClassifier(\n",
      "    HyperoptEstimatorFactory(\n",
      "        max_iter=25,\n",
      "        # -- consider an any_preprocessing() constructor that accepts\n",
      "        #    lambdas which provide initial and final steps to all the\n",
      "        #    default pre-processing pipelines.\n",
      "        preprocessing=hp.choice('pp',[\n",
      "            [my_feature_extractor('foo-pre-pca'), pca('post-foo-pca')],\n",
      "            [my_feature_extractor('foo-alone')],\n",
      "        ]),\n",
      "        classifier=any_classifier('classif')))\n",
      "mean_test_error = view.protocol(algo_pca_svm)\n",
      "print 'mean test error:', mean_test_error"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    }
   ],
   "metadata": {}
  }
 ]
}